{
 "cells": [
          {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table align=\"left\">\n",
    "  <td>\n",
    "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/gimseng/99-ML-Learning-Projects/blob/master/006/solution/ensemble_techniques.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ensemble Techniques\n",
    "\n",
    "Ensemble techniques are an extremely popular set of algorithms which involves combining multiple base learners to create a hybrid model. This hybrid model performs better than any of it's constituent learners. The set up of the jupyter notebook is as follows:\n",
    "\n",
    "1) Preprocessing: The data provided needs to cleaned before being fed into any classifier. The data is more or else clean from the start. However, proper precautions must be taken to avoid missing or NaN values. In this notebook, the range of values is re-calibrated to be between 0-1. Use df.fillna() to handle missing values.\n",
    "\n",
    "2) Bagging(DT/RF): Bagging is the process of creating sub-divisions of the dataset randomly and apply basis functions on them. The final result is obtained by aggregating the individual results. Both decision trees and random forests are easy basis functions as visual comprehension of dividing the data is feasible. \n",
    "\n",
    "3) AdaBoost: AdaBoost is a meta-heuristic responsible for taking multiple weak basis functions and aggregating them into a overall strong classifier. In this solution n_estimators specify the number of models being trained. The overall measure-of-goodness is a measure of minimizing the exponential loss function.\n",
    "\n",
    "4) GradientBoost: This is a generalized form of AdaBoost which provides a set of loss function to chose from.\n",
    "\n",
    "5) XGBoost: It is similar to GradientBoosting, with the inclusion of Lasso and Ridge regularization techniques.\n",
    "\n",
    "6) Setting hyperparameters: In all of the above techniques, the hyperparameters were set either as standard or after careful off-the-books consideration. However, how will we know which particular setting gives the best performance? For this, a set of parameters were appended in a list and the models were made to run on all of those combinations. A light version of Gradient Boosting (LGBM) is also included here. Cross-validation is conducted in a different way in this step, by considering all configurations in steps of 5.\n",
    "\n",
    "7) Stacking: This involves training multiple models and combining them for better results from an optimal combination of multiple basis functions. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#import required libraries\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.colors import ListedColormap\n",
    "import numpy as np\n",
    "\n",
    "from sklearn import preprocessing\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import balanced_accuracy_score\n",
    "from sklearn.ensemble import BaggingClassifier\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn import metrics\n",
    "from sklearn.datasets import make_moons\n",
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.metrics import r2_score\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "from lightgbm import LGBMRegressor\n",
    "from xgboost import XGBRegressor\n",
    "from sklearn.ensemble import AdaBoostClassifier\n",
    "from sklearn.svm import NuSVC\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from mlxtend.classifier import StackingCVClassifier\n",
    "\n",
    "from imblearn.ensemble import BalancedBaggingClassifier\n",
    "from imblearn.ensemble import BalancedRandomForestClassifier\n",
    "\n",
    "from mlxtend.regressor import StackingCVRegressor\n",
    "from sklearn.linear_model import Ridge, Lasso\n",
    "from sklearn.svm import SVR\n",
    "import xgboost as xgb\n",
    "from xgboost import XGBClassifier\n",
    "\n",
    "import itertools"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Preprocessing\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#read data\n",
    "data_url='https://raw.githubusercontent.com/gimseng/99-ML-Learning-Projects/master/006/data/breast_cancer_diagnosis.csv'\n",
    "df = pd.read_csv(data_url)\n",
    "X = df.drop(['diagnosis'], axis = 1)\n",
    "y = df['diagnosis']\n",
    "\n",
    "cols = X.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#scales data to be between 0-1\n",
    "x = X.values #returns a numpy array\n",
    "min_max_scaler = preprocessing.MinMaxScaler()\n",
    "x_scaled = min_max_scaler.fit_transform(x)\n",
    "X = pd.DataFrame(x_scaled)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X.columns = cols"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#plotting the relation between each feature with texture as anchor\n",
    "cols = list(X.columns)\n",
    "for i in cols:\n",
    "    X.plot.scatter(x = 'texture_mean', y = i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bagging (Decision Tree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(X, y,test_size = 0.3, random_state=50)\n",
    "#creating bagging classifier with base estimator as a decision tree\n",
    "tree = DecisionTreeClassifier()\n",
    "bag = BaggingClassifier(tree, n_estimators=100, max_samples=0.8,\n",
    "                        random_state=1)\n",
    "#training it on training data and labels\n",
    "bag.fit(X_train, y_train)\n",
    "#getting predictions\n",
    "y_pred = bag.predict(X_test)\n",
    "acc = balanced_accuracy_score(y_test, y_pred)\n",
    "\n",
    "print(\"Balanced decision tree's accuracy is : \", acc)\n",
    "\n",
    "#same process with the exception of using a balanced bagging classifier\n",
    "balanced_bag = BalancedBaggingClassifier(tree, n_estimators=100, max_samples=0.8,\n",
    "                        random_state=1)\n",
    "balanced_bag.fit(X_train,y_train)\n",
    "\n",
    "by_pred = balanced_bag.predict(X_test)\n",
    "bacc = balanced_accuracy_score(y_test,by_pred)\n",
    "\n",
    "print(\"Balanced Bagged Decision Tree Classifier's accuracy is \", bacc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bagging (Random Forest)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#same process as before\n",
    "forest = RandomForestClassifier()\n",
    "bag = BaggingClassifier(tree, n_estimators=100, max_samples=0.8,\n",
    "                        random_state=1)\n",
    "bag.fit(X_train, y_train)\n",
    "\n",
    "\n",
    "y_pred = bag.predict(X_test)\n",
    "acc = balanced_accuracy_score(y_test, y_pred)\n",
    "\n",
    "print(\"Balanced random forest's accuracy is : \", acc)\n",
    "\n",
    "balanced_bag = BalancedBaggingClassifier(tree, n_estimators=100, max_samples=0.8,\n",
    "                        random_state=1)\n",
    "balanced_bag.fit(X_train,y_train)\n",
    "\n",
    "by_pred = balanced_bag.predict(X_test)\n",
    "bacc = balanced_accuracy_score(y_test,by_pred)\n",
    "\n",
    "print(\"Balanced Bagged Random Forest Classifier's accuracy is \", bacc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ada Boost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#use the make_moons dataset for easier plotting. \n",
    "X, y = make_moons(n_samples=500, noise=0.30, random_state=42)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#this function takes in the data and it's labels and uses ListedColormap to plot the decision boundary\n",
    "#refer to matplotlib documentation for further details\n",
    "def plot_decision_boundary(clf, X, y, axes=[-1.5, 2.45, -1, 1.5], alpha=0.5, contour=True):\n",
    "    x1s = np.linspace(axes[0], axes[1], 100)\n",
    "    x2s = np.linspace(axes[2], axes[3], 100)\n",
    "    x1, x2 = np.meshgrid(x1s, x2s)\n",
    "    X_new = np.c_[x1.ravel(), x2.ravel()]\n",
    "    y_pred = clf.predict(X_new).reshape(x1.shape)\n",
    "    custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])\n",
    "    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)\n",
    "    if contour:\n",
    "        custom_cmap2 = ListedColormap(['#7d7d58','#4c4c7f','#507d50'])\n",
    "        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)\n",
    "    plt.plot(X[:, 0][y==0], X[:, 1][y==0], \"yo\", alpha=alpha)\n",
    "    plt.plot(X[:, 0][y==1], X[:, 1][y==1], \"bs\", alpha=alpha)\n",
    "    plt.axis(axes)\n",
    "    plt.xlabel(r\"$x_1$\", fontsize=18)\n",
    "    plt.ylabel(r\"$x_2$\", fontsize=18, rotation=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ada_clf = AdaBoostClassifier(\n",
    "    DecisionTreeClassifier(max_depth=1), n_estimators=200,\n",
    "    algorithm=\"SAMME\", learning_rate=0.5, random_state=42)\n",
    "ada_clf.fit(X_train, y_train)\n",
    "ada_pred = ada_clf.predict(X_test)\n",
    "print(\"AdaBoost's accuracy is \", metrics.accuracy_score(y_test, ada_pred))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_decision_boundary(ada_clf, X, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = len(X_train)\n",
    "fix, axes = plt.subplots(ncols=2, figsize=(10,4), sharey=True)\n",
    "for subplot, learning_rate in ((0, 1), (1, 0.5)):\n",
    "    sample_weights = np.ones(m)\n",
    "    plt.sca(axes[subplot])\n",
    "    for i in range(5):\n",
    "        svm_clf = SVC(kernel=\"rbf\", C=0.05, gamma=\"scale\", random_state=42)\n",
    "        svm_clf.fit(X_train, y_train, sample_weight=sample_weights)\n",
    "        y_pred = svm_clf.predict(X_train)\n",
    "        sample_weights[y_pred != y_train] *= (1 + learning_rate)\n",
    "        plot_decision_boundary(svm_clf, X, y, alpha=0.2)\n",
    "        plt.title(\"learning_rate = {}\".format(learning_rate), fontsize=16)\n",
    "    if subplot == 0:\n",
    "        plt.text(-0.7, -0.65, \"1\", fontsize=14)\n",
    "        plt.text(-0.6, -0.10, \"2\", fontsize=14)\n",
    "        plt.text(-0.5,  0.10, \"3\", fontsize=14)\n",
    "        plt.text(-0.4,  0.55, \"4\", fontsize=14)\n",
    "        plt.text(-0.3,  0.90, \"5\", fontsize=14)\n",
    "    else:\n",
    "        plt.ylabel(\"\")\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Gradient Boosting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#reverting to original dataset\n",
    "df = pd.read_csv('breast_cancer_diagnosis.csv')\n",
    "X = df.drop(['diagnosis'], axis = 1)\n",
    "y = df['diagnosis']\n",
    "\n",
    "gbrt = GradientBoostingClassifier(max_depth=10, n_estimators=3, learning_rate=0.1, random_state=42)\n",
    "gbrt.fit(X_train, y_train)\n",
    "gbrt_pred = gbrt.predict(X_test)\n",
    "preds = []\n",
    "\n",
    "print('Accuracy of Gradient Booster')\n",
    "print(metrics.accuracy_score(gbrt_pred, y_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# XG Boosting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xg_reg = xgb.XGBRegressor(objective ='reg:linear', colsample_bytree = 0.3, learning_rate = 0.1,\n",
    "                max_depth = 5, alpha = 10, n_estimators = 10)\n",
    "\n",
    "xgb = XGBClassifier(n_estimators=100)\n",
    "\n",
    "xgb.fit(X_train, y_train)\n",
    "preds = xgb.predict(X_test)\n",
    "\n",
    "acc_xgb = (preds == y_test).sum().astype(float) / len(preds)*100\n",
    "\n",
    "print(\"XGBoost's prediction accuracy is: %3.2f\" % (acc_xgb))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setting Hyperparameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "#this function uses a grid search which preprocesses the data by minimizing the MSE\n",
    "def algorithm_pipeline(X_train_data, X_test_data, y_train_data, y_test_data, \n",
    "                       model, param_grid, cv=10, scoring_fit='neg_mean_squared_error',\n",
    "                       scoring_test=r2_score, do_probabilities = False):\n",
    "    gs = GridSearchCV(\n",
    "        estimator=model,\n",
    "        param_grid=param_grid, \n",
    "        cv=cv, \n",
    "        n_jobs=-1, \n",
    "        scoring=scoring_fit,\n",
    "        verbose=2\n",
    "    )\n",
    "    fitted_model = gs.fit(X_train_data, y_train_data)\n",
    "    best_model = fitted_model.best_estimator_\n",
    "    \n",
    "    if do_probabilities:\n",
    "      pred = fitted_model.predict_proba(X_test_data)\n",
    "    else:\n",
    "      pred = fitted_model.predict(X_test_data)\n",
    "    \n",
    "    score = scoring_test(y_test_data, pred)\n",
    "    \n",
    "    return [best_model, pred, score]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "models_to_train = [XGBRegressor(), LGBMRegressor(), RandomForestRegressor()]\n",
    "#create a set of possible configurations related to the hyperparameters and iterates through them\n",
    "grid_parameters = [\n",
    "    { \n",
    "        'n_estimators': [400, 700, 1000],\n",
    "        'colsample_bytree': [0.7, 0.8],\n",
    "        'max_depth': [15,20,25],\n",
    "        'reg_alpha': [1.1, 1.2, 1.3],\n",
    "        'reg_lambda': [1.1, 1.2, 1.3],\n",
    "        'subsample': [0.7, 0.8, 0.9]\n",
    "    },\n",
    "    {\n",
    "        'n_estimators': [400, 700, 1000],\n",
    "        'learning_rate': [0.12],\n",
    "        'colsample_bytree': [0.7, 0.8],\n",
    "        'max_depth': [4],\n",
    "        'num_leaves': [10, 20],\n",
    "        'reg_alpha': [1.1, 1.2],\n",
    "        'reg_lambda': [1.1, 1.2],\n",
    "        'min_split_gain': [0.3, 0.4],\n",
    "        'subsample': [0.8, 0.9],\n",
    "        'subsample_freq': [10, 20]\n",
    "    }, \n",
    "    { \n",
    "        'max_depth':[3, 5, 10, 13], \n",
    "        'n_estimators':[100, 200, 400, 600, 900],\n",
    "        'max_features':[2, 4, 6, 8, 10]\n",
    "    }\n",
    "]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "models_preds_scores = []\n",
    "#the model's scores are noted and the best performance is displayed\n",
    "for i, model in enumerate(models_to_train):\n",
    "    params = grid_parameters[i]\n",
    "    \n",
    "    result = algorithm_pipeline(X_train, X_test, y_train, y_test, \n",
    "                                 model, params, cv=5)\n",
    "    models_preds_scores.append(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for result in models_preds_scores:\n",
    "    print('Model: {0}, Score: {1}'.format(type(result[0]).__name__, result[2]))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Stacking"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "classifier1 = SVC(C = 50, degree = 1, gamma = \"auto\", kernel = \"rbf\", probability = True)\n",
    "\n",
    "# Initializing Multi-layer perceptron  classifier\n",
    "classifier2 = MLPClassifier(activation = \"relu\", alpha = 0.1, hidden_layer_sizes = (10,10,10),\n",
    "                            learning_rate = \"constant\", max_iter = 2000, random_state = 1000)\n",
    "\n",
    "# Initialing Nu Support Vector classifier\n",
    "classifier3 = NuSVC(degree = 1, kernel = \"rbf\", nu = 0.25, probability = True)\n",
    "\n",
    "# Initializing Random Forest classifier\n",
    "classifier4 = RandomForestClassifier(n_estimators = 500, criterion = \"gini\", max_depth = 10,\n",
    "                                     max_features = \"auto\", min_samples_leaf = 0.005,\n",
    "                                     min_samples_split = 0.005, n_jobs = -1, random_state = 1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "sclf = StackingCVClassifier(classifiers = [classifier1, classifier2, classifier3, classifier4],\n",
    "                            shuffle = False,\n",
    "                            use_probas = True,\n",
    "                            cv = 5,\n",
    "                            meta_classifier = SVC(probability = True))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "classifiers = {\"SVC\": classifier1,\n",
    "               \"MLP\": classifier2,\n",
    "               \"NuSVC\": classifier3,\n",
    "               \"RF\": classifier4,\n",
    "               \"Stack\": sclf}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for key in classifiers:\n",
    "    # Get classifier\n",
    "    classifier = classifiers[key]\n",
    "    \n",
    "    # Fit classifier\n",
    "    classifier.fit(X_train, y_train)\n",
    "        \n",
    "    # Save fitted classifier\n",
    "    classifiers[key] = classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = pd.DataFrame()\n",
    "for key in classifiers:\n",
    "    # Make prediction on test set\n",
    "    y_pred = classifiers[key].predict_proba(X_test)[:,1]\n",
    "    \n",
    "    # Save results in pandas dataframe object\n",
    "    results[f\"{key}\"] = y_pred\n",
    "\n",
    "# Add the test set to the results object\n",
    "results[\"Target\"] = y_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "\n",
    "sns.set(font_scale = 1)\n",
    "sns.set_style({\"axes.facecolor\": \"1.0\", \"axes.edgecolor\": \"0.85\", \"grid.color\": \"0.85\",\n",
    "               \"grid.linestyle\": \"-\", 'axes.labelcolor': '0.4', \"xtick.color\": \"0.4\",\n",
    "               'ytick.color': '0.4'})\n",
    "\n",
    "# Plot\n",
    "f, ax = plt.subplots(figsize=(13, 4), nrows=1, ncols = 5)\n",
    "\n",
    "for key, counter in zip(classifiers, range(5)):\n",
    "    # Get predictions\n",
    "    y_pred = results[key]\n",
    "    \n",
    "    # Get AUC\n",
    "    auc = metrics.roc_auc_score(y_test, y_pred)\n",
    "    textstr = f\"AUC: {auc:.3f}\"\n",
    "\n",
    "    # Plot false distribution\n",
    "    false_pred = results[results[\"Target\"] == 0]\n",
    "    sns.distplot(false_pred[key], hist=True, kde=False, \n",
    "                 bins=int(25), color = 'red',\n",
    "                 hist_kws={'edgecolor':'black'}, ax = ax[counter])\n",
    "    \n",
    "    # Plot true distribution\n",
    "    true_pred = results[results[\"Target\"] == 1]\n",
    "    sns.distplot(results[key], hist=True, kde=False, \n",
    "                 bins=int(25), color = 'green',\n",
    "                 hist_kws={'edgecolor':'black'}, ax = ax[counter])\n",
    "    \n",
    "    \n",
    "    # These are matplotlib.patch.Patch properties\n",
    "    props = dict(boxstyle='round', facecolor='white', alpha=0.5)\n",
    "    \n",
    "    # Place a text box in upper left in axes coords\n",
    "    ax[counter].text(0.05, 0.95, textstr, transform=ax[counter].transAxes, fontsize=14,\n",
    "                    verticalalignment = \"top\", bbox=props)\n",
    "    \n",
    "    # Set axis limits and labels\n",
    "    ax[counter].set_title(f\"{key} Distribution\")\n",
    "    ax[counter].set_xlim(0,1)\n",
    "    ax[counter].set_xlabel(\"Probability\")\n",
    "\n",
    "# Tight layout\n",
    "plt.tight_layout()\n",
    "\n",
    "# Save Figure\n",
    "plt.savefig(\"Probability Distribution for each Classifier.png\", dpi = 1080)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = {\"meta_classifier__kernel\": [\"linear\", \"rbf\", \"poly\"],\n",
    "          \"meta_classifier__C\": [1, 2],\n",
    "          \"meta_classifier__degree\": [3, 4, 5],\n",
    "          \"meta_classifier__probability\": [True]}\n",
    "\n",
    "\n",
    "# Initialize GridSearchCV\n",
    "grid = GridSearchCV(estimator = sclf, \n",
    "                    param_grid = params, \n",
    "                    cv = 5,\n",
    "                    scoring = \"roc_auc\",\n",
    "                    verbose = 10,\n",
    "                    n_jobs = -1)\n",
    "\n",
    "# Fit GridSearchCV\n",
    "grid.fit(X_train, y_train)\n",
    "\n",
    "# Making prediction on test set\n",
    "y_pred = grid.predict_proba(X_test)[:,1]\n",
    "\n",
    "# Getting AUC\n",
    "auc = metrics.roc_auc_score(y_test, y_pred)\n",
    "\n",
    "# Print results\n",
    "print(f\"The AUC of the tuned Stacking classifier is {auc:.3f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "classifier_labels = [\"SVC\", \"MLP\", \"NuSVC\", \"RF\"]\n",
    "\n",
    "# Get all unique combinations of classifier with a set size greater than or equal to 2\n",
    "combo_classifiers = []\n",
    "for ii in range(2, len(classifier_labels)+1):\n",
    "    for subset in itertools.combinations(classifier_labels, ii):\n",
    "        combo_classifiers.append(subset)\n",
    "\n",
    "# Stack, tune, and evaluate stack of classifiers\n",
    "for combo in combo_classifiers:\n",
    "    # Get labels of classifier to create a stack\n",
    "    labels = list(combo)\n",
    "     \n",
    "    # Get classifiers\n",
    "    classifier_combo = []\n",
    "    for ii in range(len(labels)):\n",
    "        label = classifier_labels[ii]\n",
    "        classifier = classifiers[label]\n",
    "        classifier_combo.append(classifier)\n",
    "         \n",
    "    # Initializing the StackingCV classifier\n",
    "    sclf = StackingCVClassifier(classifiers = classifier_combo,\n",
    "                                shuffle = False,\n",
    "                                use_probas = True,\n",
    "                                cv = 5,\n",
    "                                meta_classifier = SVC(probability = True),\n",
    "                                n_jobs = -1)\n",
    "    \n",
    "    # Initialize GridSearchCV\n",
    "    grid = GridSearchCV(estimator = sclf, \n",
    "                        param_grid = params, \n",
    "                        cv = 5,\n",
    "                        scoring = \"roc_auc\",\n",
    "                        verbose = 0,\n",
    "                        n_jobs = -1)\n",
    "    \n",
    "    # Fit GridSearchCV\n",
    "    grid.fit(X_train, y_train)\n",
    "    \n",
    "    # Making prediction on test set\n",
    "    y_pred = grid.predict_proba(X_test)[:,1]\n",
    "    \n",
    "    # Getting AUC\n",
    "    auc = metrics.roc_auc_score(y_test, y_pred)\n",
    "    \n",
    "    # Print results\n",
    "    print(f\"AUC of stack {combo}: {auc:.3f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
